Lecture 02: Cause and Effect¶
Associated Textbook Sections: 2.0, 2.1, 2.2, 2.3, 2.4, 2.5
Overview¶
Associations¶
Headline¶
📢 "Regularly Eating Chocolate Is Linked to 8 Percent Lower Heart Attack Risk" - everydayhealth.com
A Headline Source¶
- Headlines are created to capture our attention.
- Sometimes headlines are created from actual studies, and sometimes they are not.
- This headline has a study from the European Journal of Preventive Cardiology as the source.
Study Observations¶
- Meta-Study: Combines the results from several studies
- Individuals: Study subjects, participants, units, etc.
  - 336,289 US, Swedish, and Australian adults from several studies
- Treatment: A specific intervention or condition that is applied to individuals
  - Chocolate consumption
- Outcome: A variable or event that researchers measure or observe to assess the effects of the treatment(s)
  - Coronary heart disease risk
An Initial Question¶
Is there an association between chocolate consumption and heart disease risk?
Yes, the reviewed article in the European Journal of Preventive Cardiology concludes that those who consumed chocolate more than once per week (or more than 3.5 times per month) had fewer cases of heart disease than those who didn't.
A Follow Up Question¶
Does chocolate consumption lead to a reduction in heart disease?
This question is often harder to answer and refers to the concept of causality.
Not necessarily: several other factors could explain why fewer people who consumed chocolate regularly developed heart disease. For example, greater financial means could explain both the freedom to buy more foods like chocolate and better health care access, which could in turn explain fewer cases of heart disease.
“Dr. Alice Lichtenstein, an American Heart Association volunteer and professor of nutrition science and policy at Tufts University, was more skeptical of the findings.”
Association and Causality¶
Does the treatment affect the outcome?
- Association: Any relationship between treatment and outcome
- Causality: Treatment caused the outcome
Two Study Designs¶
At a basic level, we will consider two types of study designs for collecting and analyzing data:
- Observational Studies: Observing outcomes without intervening or manipulating variables.
- Experiments: Manipulating variables to observe the effect on outcomes.
There are several sub-categories of observational studies and experiments that you will encounter in this class.
At a high level, the chocolate study is an observational study, so we can only draw conclusions about association, not causation.
Beauty Sleep¶
An Experimental Study¶
- Question: Are sleep-deprived people perceived as less healthy, less attractive and more tired than those who get a normal night's sleep?
- Background:
  - Sleep laboratory in Stockholm, Sweden
  - 23 healthy, sleep-deprived adults (age 18-31) who were photographed twice
    - After 8 hours of sleep
    - After 31 hours of wakefulness following a night of reduced sleep
  - 65 untrained observers (age 18-61) who rated the photographs
  - Photographs were presented in a randomized order
- Publication: https://www.bmj.com/content/341/bmj.c6614
Comparison¶
- Treatment Group:
  - Receives the experimental treatment or intervention being tested.
  - The goal is to assess the impact or effectiveness of the treatment.
- Control Group:
  - Does not receive the experimental treatment.
  - Serves as a baseline or reference group to which the treatment group is compared.
  - May receive no treatment, a placebo, a treatment with well-understood outcomes, etc.
Results¶
- Outcome: Difference in perceived health, attractiveness, and tiredness between sleep-deprived (treatment) and well-rested (control) participants
- Results: Sleep-deprived people were rated as less healthy, more tired, and less attractive
Causation?¶
- Can we say that sleeping more causes you to be perceived as more healthy, more attractive, and less tired?
- No. Something is missing in this study design.
Cholera in London¶
London, Early 1850s¶
- Many people were dying from a violent, rapid loss of fluids
- At this time, experts believed bad smell (miasma) was the main source of disease
- Florence Nightingale
- Origins of phrases like: "fly to clene air", "a pocket full o'posies", etc.
John Snow¶
- An English physician
- Against miasma theory
- Collected data on where people were dying in London
- Believed water to be the source of the disease, not smell.
Cholera Map¶
According to the National Geographic Society,
"This map of London was created by John Snow in 1854. London was experiencing a deadly cholera epidemic, when Snow tracked the cases on this map. The cholera cases are highlighted in black. Using this map, Snow and other scientists were able to trace the cholera outbreak to a single infected water pump."
London Water Supply Service Regions¶
In the image:
- Blue area - Southwark and Vauxhall (S&V) Company (sourced after passing through London)
- Red area - Lambeth Company (sourced before entering London)
- Purple area - Both Companies are intermingled
Forming an Argument¶
John Snow's Data¶
from datascience import *
import numpy as np

# Houses served and cholera deaths recorded in each water supply area
snows_table = Table(['Supply Area', 'Number of Houses', 'Cholera Deaths']).with_rows([
    ['S&V', 40046, 1263],
    ['Lambeth', 26107, 98],
    ['Rest of London', 256423, 1422]
])
snows_table
| Supply Area | Number of Houses | Cholera Deaths |
|---|---|---|
| S&V | 40046 | 1263 |
| Lambeth | 26107 | 98 |
| Rest of London | 256423 | 1422 |
Relative Frequency¶
- Comparing counts (frequencies) from groups of different sizes can be misleading
- Relative frequencies are expressed as the counts of each group divided by the total number of values within each group
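As a worked instance of this definition, the S&V row of Snow's table (1,263 cholera deaths among 40,046 houses) gives:

$$
\text{relative frequency} = \frac{\text{count in group}}{\text{group total}}, \qquad \frac{1263}{40046} \approx 0.0315
$$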
Demo: Calculating Relative Frequency¶
To compare the death totals across supply areas, calculate the relative frequency of deaths per household (Deaths per House).
number_of_deaths = snows_table.column('Cholera Deaths')
number_of_houses = snows_table.column('Number of Houses')
# Element-wise division: one relative frequency per supply area
death_per_house = number_of_deaths / number_of_houses
snows_table.with_column('Deaths per House', death_per_house)
| Supply Area | Number of Houses | Cholera Deaths | Deaths per House |
|---|---|---|---|
| S&V | 40046 | 1263 | 0.0315387 |
| Lambeth | 26107 | 98 | 0.00375378 |
| Rest of London | 256423 | 1422 | 0.00554552 |
Scale for Readability¶
- It is easy to get lost in comparing fractional values like 0.0315387 and 0.00375378.
- Scaling rates by multiplying the values by some scale factor is a common presentation technique.
- The clarity may come with exaggerated interpretations!
Demo: Adjusting Snow's Table¶
Scale and round the relative frequencies to show whole numbers.
scale_factor = 10000  # report rates per 10,000 houses
deaths_per_10000_houses = death_per_house * scale_factor
snows_table.with_column('Deaths per 10,000 Houses',
                        np.round(deaths_per_10000_houses))
| Supply Area | Number of Houses | Cholera Deaths | Deaths per 10,000 Houses |
|---|---|---|---|
| S&V | 40046 | 1263 | 315 |
| Lambeth | 26107 | 98 | 38 |
| Rest of London | 256423 | 1422 | 55 |
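One way to check that scaling changes only the presentation, not the comparison, is to express each area's rate as a multiple of a baseline area. The sketch below uses plain Python rather than the `datascience` table. The helper names `deaths_per` and `rate_ratio` and the choice of Lambeth as the baseline are assumptions made for illustration; the counts are copied from Snow's table.

```python
# Counts copied from Snow's table: (number of houses, cholera deaths).
snow_counts = {
    'S&V': (40046, 1263),
    'Lambeth': (26107, 98),
    'Rest of London': (256423, 1422),
}

def deaths_per(area, scale=1):
    """Deaths per `scale` houses in an area (scale=1 gives deaths per house)."""
    houses, deaths = snow_counts[area]
    return deaths / houses * scale

def rate_ratio(area, baseline='Lambeth', scale=1):
    """How many times higher the area's death rate is than the baseline's."""
    return deaths_per(area, scale) / deaths_per(baseline, scale)

# The ratio is the same whether rates are per house or per 10,000 houses.
for scale in (1, 10000):
    for area in snow_counts:
        print(f'scale={scale:>5}  {area:<15} {rate_ratio(area, scale=scale):.1f}x Lambeth')
```

Because the scale factor multiplies both the numerator and the denominator of the ratio, it cancels out, so the S&V rate is the same multiple of the Lambeth rate at either scale.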
Confounding Variables¶
Confounding Factors Weaken a Causal Argument¶
- If the treatment and control groups have systematic differences other than the treatment, then it might be difficult to identify causality.
- Such differences are often present in observational studies.
- When they lead researchers astray, they are called confounding factors.
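To make the idea concrete, here is a small simulation sketch in plain Python that revisits the chocolate headline. Everything in it is hypothetical: the probabilities are made up for illustration and are not taken from the study. Wealth raises the chance of eating chocolate and lowers the chance of heart disease, while chocolate itself has no effect at all in the simulation.

```python
import random

def simulate(n=100_000, seed=0):
    """Simulate n people as (wealthy, eats_chocolate, heart_disease) tuples."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        wealthy = rng.random() < 0.5
        # Hypothetical probabilities, chosen only for illustration.
        eats_chocolate = rng.random() < (0.7 if wealthy else 0.3)
        # Disease risk depends on wealth only; chocolate has NO effect here.
        heart_disease = rng.random() < (0.05 if wealthy else 0.15)
        people.append((wealthy, eats_chocolate, heart_disease))
    return people

def disease_rate(people, chocolate, wealthy=None):
    """Share with heart disease among people matching the given filters."""
    group = [p for p in people
             if p[1] == chocolate and (wealthy is None or p[0] == wealthy)]
    return sum(p[2] for p in group) / len(group)

people = simulate()
print('Everyone, chocolate:   ', round(disease_rate(people, True), 3))
print('Everyone, no chocolate:', round(disease_rate(people, False), 3))
print('Wealthy, chocolate:    ', round(disease_rate(people, True, wealthy=True), 3))
print('Wealthy, no chocolate: ', round(disease_rate(people, False, wealthy=True), 3))
```

Across everyone, chocolate eaters show less heart disease, yet within the wealthy group the two rates are roughly equal. In this simulated world the association comes entirely from the confounding factor.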
Attempting to Establish Causation¶
- Snow's data suggest that people were more likely to die of cholera if they drank water supplied by S&V.
- Skeptics might point to other differences between the groups, such as age, socioeconomic status, etc.
- The individuals in the intermingled (purple) region were similar in most demographics aside from their water source.
- Households there were assigned to the two treatments (water supplies) in a somewhat haphazard way.
- Snow felt this "natural experiment" (an observational study) ruled out enough of the confounding factors, and he broke off the handle of the Broad Street water pump (S&V-sourced).
Randomized Controlled Experiments¶
Randomized Controlled Experiment¶
Randomized controlled experiments are typically the standard study design for establishing causation.
- Randomization: Randomly assigning individuals to a specific treatment
  - You can (mathematically) account for variability in the assignment.
  - Regardless of what the dictionary says, in probability theory:
    - Random ≠ Haphazard
    - Random refers to the concept of unpredictability
- Controlled: At least one treatment is a control.
If you assign individuals to treatment and control at random, then the two groups are likely to be similar apart from the treatment.
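A minimal sketch of random assignment, assuming a simple shuffle-and-split scheme, is shown below. The function name `randomize`, the participant list, and the ages are all made up for illustration. The point is that a characteristic nobody used during assignment, such as age, tends to end up similar in the two groups.

```python
import random

def randomize(individuals, seed=None):
    """Randomly split individuals into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(individuals)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def mean_age(group):
    """Average age of a group of (id, age) pairs."""
    return sum(age for _, age in group) / len(group)

# Hypothetical participants: (id, age) pairs with made-up ages.
age_rng = random.Random(1)
participants = [(i, age_rng.randint(18, 80)) for i in range(1000)]

treatment, control = randomize(participants, seed=2)
print('treatment mean age:', round(mean_age(treatment), 1))
print('control mean age:  ', round(mean_age(control), 1))
```

The seed is fixed only so the sketch is reproducible. The assignment ignores every characteristic of the participants, which is why the groups are likely, though not guaranteed, to be similar.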
A Randomized Controlled Experiment¶
A study of the effect of an experimental drug for treating leukemia:
- An example of a modern randomized controlled experiment
- Treatment: 1 to 2 cycles of induction therapy with CPX-351 - an experimental drug
- Control: Conventional 7+3 - a standard of care chemotherapy
- Randomization: After patients were divided into smaller groups based on their ages, they were randomly assigned to the treatment or control.
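Randomizing within age groups, as described in the last bullet, can be sketched as follows. This is an illustration only, not the trial's actual procedure: the age cutoff, the patient ages, and the names `stratified_randomize` and `age_group` are hypothetical.

```python
import random
from collections import defaultdict

def stratified_randomize(patients, stratum_of, seed=None):
    """Randomly assign treatment/control separately within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for patient in patients:
        strata[stratum_of(patient)].append(patient)

    assignment = {}
    for stratum in sorted(strata):
        members = strata[stratum]
        rng.shuffle(members)  # random order within this stratum only
        half = len(members) // 2
        for patient in members[:half]:
            assignment[patient] = 'treatment'
        for patient in members[half:]:
            assignment[patient] = 'control'
    return assignment

def age_group(patient):
    """Hypothetical age cutoff, for illustration only."""
    _, age = patient
    return 'under 70' if age < 70 else '70 and over'

# Made-up patients: (id, age) pairs.
age_rng = random.Random(3)
patients = [(i, age_rng.randint(60, 75)) for i in range(300)]
assignment = stratified_randomize(patients, age_group, seed=4)

for group in ('under 70', '70 and over'):
    arms = [arm for p, arm in assignment.items() if age_group(p) == group]
    print(group, '- treatment:', arms.count('treatment'), 'control:', arms.count('control'))
```

Splitting each age group in half guarantees that treatment and control have nearly the same age mix, rather than leaving that balance to chance.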
Ethical Concerns¶
Sometimes randomized controlled experiments are considered unethical:
- Assigning a patient something other than a known effective treatment.
- The participants might be from populations that are not able to fully consent to treatment.
- The treatments might be risky. (Correct dosages might not be fully understood.)
- etc.
Identifying and responding to ethical concerns is, and should be, a common part of a researcher's work.
Reflecting on John Snow's Natural Experiment¶
John Snow's story has some issues:
- Falsely relies on the idea that water was distributed randomly to a certain area of London
- The area studied was very dense, so it is not possible to rule out all systemic factors
- Removing the pump handle prematurely should raise ethical concerns
- Ultimately, the disease results from the way our bodies respond to the cholera bacteria, not from the pump itself